Resiliently Retaining State Information Of A Many-Core Processor

ABSTRACT

In one embodiment, the present invention includes a method for performing dynamic testing of a many-core processor including a plurality of cores, manipulating data obtained from the dynamic testing into profile information of the many-core processor, and storing the profile information in a non-volatile memory. The non-volatile memory may be within the many-core processor, in some embodiments. Other embodiments are described and claimed.

This application is a divisional of U.S. patent application Ser. No. 11/387,385 filed Mar. 23, 2006 entitled “RESILIENTLY RETAINING STATE INFORMATION OF A MANY-CORE PROCESSOR,” the content of which is hereby incorporated by reference.

BACKGROUND

Embodiments of the present invention relate generally to processors, and more particularly to processors including multiple cores such as many-core processors.

A many-core processor includes multiple processing cores on one or more die, typically on a single die. As process technologies scale to very small dimensions, the prevailing design approach of achieving high performance by increasing processor frequency is limited due to increased power consumption. One alternative approach to achieve high performance is to distribute an application across many “small” cores that run concurrently at slower speeds than a typical “larger” core. Because each “small” core is simpler, smaller and far less power hungry than a “large” core while still delivering significant performance, a many-core design can help manage power consumption more efficiently than a single or large-core design.

Although a many-core processor has advantages over a processor with a single core or a few large cores, it also faces many challenges as process technologies scale down. For example, process variations, either static or dynamic, can make transistors unreliable; transient error rates may be high since capacitance on storage nodes is small and voltages are low; and reliability over time may deteriorate as transistor degradation becomes more severe as years pass. Thus one-time factory testing and burn-in, as implemented for conventional processors, become less effective at ensuring reliable computing over time with a many-core processor.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a many-core processor in accordance with one embodiment of the present invention.

FIG. 2 is a block diagram of a many-core processor in accordance with another embodiment of the present invention.

FIG. 3 is a flow diagram of a method in accordance with one embodiment of the present invention.

FIG. 4 is a flow diagram of a method for using profile information stored in a non-volatile memory in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram of a multiprocessor system in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention may use a non-volatile memory to resiliently store profile information of a many-core processor. The many-core processor may include a large number of small cores situated on a single die of a semiconductor package. Further, in various implementations the non-volatile memory may also be situated on the same die as the cores. The many-core processor may be dynamically tested, e.g., via self-testing, to obtain profile information for storage in the non-volatile memory.

As will be described further below, various profile information may be stored in the non-volatile memory. In different embodiments, such profile information may include frequency and voltage information regarding cores, as well as dynamic information. Additional resilient state of the many-core processor may be further stored in the non-volatile memory. Such resilient state information may include performance information, as will be discussed further below. Still further, task allocation information regarding tasks allocated to various operating cores can be stored. To aid in such tasks, the non-volatile memory may further store a configuration of an interconnect fabric that couples the operating cores together. Of course, additional profile information may be stored in the non-volatile memory in different embodiments.

In various embodiments, cores of a many-core processor may be periodically tested to obtain and/or refresh their dynamic profiles. The dynamic profile of a core may include information on its maximum operating frequency, power consumption, power leakage, and functional correctness, among other parameters. The dynamic profile may also include trending information of these parameters, indicating reliability of a corresponding core over time.
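As a purely illustrative sketch, and not a description of any particular embodiment, a per-core dynamic profile with simple trending might be represented as follows. The class name `CoreProfile`, its field names, and the units are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoreProfile:
    """Hypothetical dynamic profile of one core; all field names are illustrative."""
    core_id: int
    max_freq_mhz: float
    power_mw: float
    leakage_mw: float
    functionally_correct: bool
    # Earlier maximum-frequency readings, oldest first, kept for trending.
    freq_history_mhz: List[float] = field(default_factory=list)

    def record_test(self, max_freq_mhz, power_mw, leakage_mw, functionally_correct):
        """Refresh the profile with new test results, archiving the old frequency."""
        self.freq_history_mhz.append(self.max_freq_mhz)
        self.max_freq_mhz = max_freq_mhz
        self.power_mw = power_mw
        self.leakage_mw = leakage_mw
        self.functionally_correct = functionally_correct

    def freq_trend_mhz(self):
        """Current maximum frequency minus the oldest recorded one (negative means degradation)."""
        if not self.freq_history_mhz:
            return 0.0
        return self.max_freq_mhz - self.freq_history_mhz[0]
```

In this sketch a negative trend value would suggest that the core has slowed since its first recorded test, which is one possible way of expressing the trending information mentioned above.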

In various embodiments, tasks may be allocated and dynamically reallocated to cores based on current core profiles. If these profiles change during execution, e.g., as a result of updated profile information, task allocation may be dynamically changed to maintain a desired performance level. Thus this task allocation information may also be stored as part of the resilient state and any updates to this task allocation mapping may also be stored in non-volatile memory.

Referring now to FIG. 1, shown is a block diagram of a many-core processor in accordance with one embodiment of the present invention. As shown in FIG. 1, processor 10 includes a plurality of individual cores 15. More specifically, the embodiment of FIG. 1 shows a configuration that includes an 8×8 array of cores coupled via an interconnect fabric 30. While shown with this particular implementation in the embodiment of FIG. 1, it is to be understood that the scope of the present invention is not so limited, and in other embodiments other configurations may be present, such as one-dimensional, two-dimensional or three-dimensional meshes or one-dimensional, two-dimensional, or three-dimensional torus configurations, for example. Further, while shown with 64 individual cores in the embodiment of FIG. 1, it is to be understood that many-core processors may include more or fewer such cores in different implementations.

Each core 15 may be a relatively small core, at least compared with single core or dual-core processors. In various embodiments, each core 15 may include a local memory (e.g., a cache memory) and further may be coupled to shared memory. Specifically, as shown in FIG. 1, a shared memory 20, which is a global shared memory, may be coupled to individual cores 15 via interconnect fabric 30. While not shown in FIG. 1 for ease of illustration, it is to be understood that processor 10 may include other components, such as input/output (I/O) interfaces, interconnects, buses, logic and the like.

Cores 15 may be selected for activation based on various algorithms. To effect such activations, interconnect fabric 30 may also be configurable so as to enable improved connectivity between activated cores 15, increasing communication speeds. In the embodiment of FIG. 1, resilient state data regarding the various cores 15 may be stored in non-volatile memory present within the cores themselves. Alternately, a non-volatile memory may be located within a processor but outside the cores of the processor. However, in other embodiments the resilient state data may be stored in a non-volatile memory external to processor 10.

Referring now to FIG. 2, shown is a block diagram of a many-core processor in accordance with another embodiment of the present invention. As shown in FIG. 2, processor 50 may include similar components to those discussed above regarding FIG. 1. Specifically, a plurality of cores 15 may be coupled via an interconnect fabric 30. Furthermore, a shared memory 20 may be present. However, in the embodiment of FIG. 2, a non-volatile memory 40 may be located within processor 50. Non-volatile memory 40 may be used to store resilient state data regarding cores 15. While shown in the embodiment of FIG. 2 as being implemented on the same die as cores 15, non-volatile memory 40 may be located within a package of processor 50 but on a separate die, in other embodiments. Of course, other implementations are possible.

Referring now to FIG. 3, shown is a flow diagram of a method in accordance with one embodiment of the present invention. As shown in FIG. 3, method 200 may be used to obtain profile information and store the obtained information in a non-volatile memory. Method 200 may begin by performing dynamic testing on the cores of a many-core processor (block 210). Such dynamic testing may take various forms. For example, at regular intervals a dynamic testing process may be initiated in which neighboring cores test the capabilities of other neighboring cores. Alternately, one or more cores of the many-core processor may be selected as dedicated (i.e., checker) cores for performing such dynamic testing. In this way, the many-core processor is capable of self-testing to determine its operating capabilities.

As discussed above, various parameters may be determined based upon the testing. For example, voltage and frequency values such as maximum operating frequency and operating voltage may be determined. Furthermore, functional correctness of cores may be determined, e.g., by performing one or more operations in multiple cores and comparing the results. If the results differ, one of the cores may be indicated as failing the functional correctness test.
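One possible way to compare results across cores is sketched below. The use of a majority vote to decide which core is at fault is an assumption made for this example only; the description above states merely that differing results may cause one of the cores to be indicated as failing. The function name `find_failing_cores` is likewise hypothetical.

```python
from collections import Counter


def find_failing_cores(results):
    """Return the ids of cores whose result differs from the majority result.

    `results` maps a core id to the (hashable) value that core computed for the
    same test operation. Majority voting is an assumption of this sketch; with
    a tie, the value encountered first is treated as the reference.
    """
    if not results:
        return []
    majority_value, _ = Counter(results.values()).most_common(1)[0]
    return sorted(cid for cid, value in results.items() if value != majority_value)
```

For instance, if three cores run the same operation and only one of them returns a different value, that core alone would be reported by this sketch.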

Still referring to FIG. 3, next it may be determined whether the dynamic testing is the original dynamic testing on the many-core processor (diamond 220). This original dynamic testing may correspond to testing performed upon an initial power up of a system including the many-core processor. If the testing is the original dynamic testing, control passes to block 230. There, the test data may be manipulated into profile information (block 230). Various manners of manipulating the data are possible. For example, test data regarding operating speeds of the cores may be manipulated into a so-called bin value. That is, each core may be partitioned into one of a selected number of bins based on its maximum operating speed. Accordingly, the cores of the many-core processor may be segmented into multiple bins, for example, a fast bin, a medium bin, and a slow bin. Furthermore, any failed cores that are unable to further operate may be placed in a failed bin. In addition to such bins for speeds, operating cores may also be segmented into active cores and spare cores, where the active cores may be selected for operation according to a particular configuration, while the spare cores may remain in a spare pool for later configuration to the active state, e.g., when one or more of the active cores later fails. In this way, lifetime reliability of the many-core processor may be enhanced.
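The manipulation of speed test data into bin values might be sketched as follows. The threshold values, the function name `assign_speed_bins`, and the convention that a failed core is reported with a frequency of `None` (or a non-positive value) are all assumptions for illustration.

```python
def assign_speed_bins(max_freq_mhz, fast_threshold_mhz=1800.0, medium_threshold_mhz=1400.0):
    """Partition cores into fast, medium, slow and failed bins by maximum speed.

    `max_freq_mhz` maps a core id to its measured maximum operating frequency,
    or to None when the core could not operate. Thresholds are hypothetical.
    """
    bins = {"fast": [], "medium": [], "slow": [], "failed": []}
    for core_id, freq in sorted(max_freq_mhz.items()):
        if freq is None or freq <= 0:
            bins["failed"].append(core_id)
        elif freq >= fast_threshold_mhz:
            bins["fast"].append(core_id)
        elif freq >= medium_threshold_mhz:
            bins["medium"].append(core_id)
        else:
            bins["slow"].append(core_id)
    return bins
```

The resulting mapping is one form the bin information could take before it is written to the non-volatile memory; a separate active/spare partition could be layered on top in the same manner.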

Note that the mix of cores in different bins may be tuned to obtain better control of the number of cores in a particular bin. The total mix of cores may be recorded as part of the resilient state. Over time, one or more cores may be moved from a fast bin to a slower bin due to degradation in performance. However, the total performance of the many-core processor may be maintained by adjusting a mix of cores or adjusting operating parameters of one or more cores. For example, a spare core may be added to a fast bin. Alternately, an existing core may be run at a higher frequency using voltage scaling or body bias techniques. In some embodiments, voltage, bias values, and frequency of each core may also be recorded as part of the resilient state.

Still referring to FIG. 3, next the profile information may be stored in non-volatile memory (block 240). As described above, this non-volatile memory may be configured in various manners. For example, a single substrate including the cores may further include non-volatile memory for storage of the profile information. In one such implementation, each core may include a portion of the non-volatile memory to store its profile information. Yet in other embodiments, a package of the many-core processor may include a separate substrate having the non-volatile memory. Still further, the non-volatile memory may be a separate component of a system including the many-core processor, for example, a flash memory such as a basic input/output system (BIOS), read only memory (ROM) or another non-volatile storage of the system. From block 240, control passes back to block 210, discussed above. Accordingly, method 200 may continue to perform dynamic testing during normal operation of a system including the many-core processor.

Referring back to diamond 220 of FIG. 3, if instead it is determined that the dynamic testing is not the original dynamic testing, control passes to block 250. There, the non-volatile memory may be updated with changed information (block 250). For example, if the results of the dynamic testing indicate that the operating parameters of one or more cores have changed, e.g., operation at a lower frequency or voltage, or failure of a functional correctness test, the profile information corresponding to the changed information may be updated in the non-volatile memory. Then, control passes back to block 210, discussed above. While described with this particular implementation in the embodiment of FIG. 3, it is to be understood that the scope of the present invention is not so limited, and other manners of obtaining profile information and storing the obtained information in a non-volatile storage may be performed in other embodiments.
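The branch at diamond 220 between storing a full profile (block 240) and updating only changed entries (block 250) can be illustrated with the following sketch. The non-volatile memory is modeled here as an ordinary dictionary, which is an assumption made purely for illustration, as are the function and parameter names.

```python
def store_or_update(nvm, profile, is_original_test):
    """Write profile information to a dictionary standing in for non-volatile memory.

    On the original dynamic test the whole profile is stored; on later tests
    only entries that differ from the stored copy are rewritten. Returns the
    sorted ids of the cores whose entries were written.
    """
    if is_original_test:
        nvm.clear()
        nvm.update(profile)
        return sorted(profile)
    changed = [cid for cid, entry in sorted(profile.items()) if nvm.get(cid) != entry]
    for cid in changed:
        nvm[cid] = profile[cid]
    return changed
```

Writing only the changed entries on later passes is one plausible design choice where the memory has limited write endurance, although nothing in the description above requires it.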

Referring now to FIG. 4, shown is a flow diagram of a method for using profile information stored in a non-volatile memory in accordance with an embodiment of the present invention. As shown in FIG. 4, method 300 may begin upon initialization of a processor, e.g., when a system including the processor is powered up. Accordingly, a power on signal may be received (block 310). This signal, when received in the processor, may cause a reset of the many-core processor (block 320). Such a reset may reset various resources of the processor, including each of the cores of the processor and the resources thereof including, for example, registers, execution units, buffers, caches and the like.

Still referring to FIG. 4, next profile information stored in the non-volatile memory may be accessed after reset has been performed (block 330). This access may seek profile information from wherever the non-volatile memory is located. For example, a control core may include logic or may be programmed to perform the request for the profile information. In some embodiments, such a control core may access non-volatile memory that is on the same die or in the same package as the cores. Or the non-volatile memory may be otherwise located in the system. In still other embodiments each core may access its own profile information that is stored in a non-volatile storage of the core itself.

In any event, the cores may be configured based on the profile information (block 340). For example, cores of one or more performance bins may be enabled. Furthermore, an interconnection fabric between enabled cores may be configured to provide for improved communication (also block 340). For example, the interconnection fabric may be dynamically configured to provide optimal data transfer between active cores based on the relative location of these cores.

After such configuration, normal operation of the many-core processor may be entered. Accordingly, various processes may be performed in one or more cores of the many-core processor. During such normal operation, at a selected time interval or upon an indication of a failure or degraded performance in one or more of the cores, dynamic testing of the cores may be performed, as described herein.

As a result of such testing, changes to the many-core processor, such as availability of cores, maximum operating speed of one or more cores or similar such changes may be identified. Accordingly, still referring to FIG. 4, at diamond 350 it may be determined whether an indication of a change to the many-core processor has been received (diamond 350). For example, such an indication may be initiated upon an update to the non-volatile memory with updated profile information. If no such indication is received, normal operation of the many-core processor continues.

If instead at diamond 350 an indication of a change to the many-core processor is received, control may pass to block 360. There, the updated profile information may be accessed from the non-volatile memory (block 360). The updated information may be readily identified, for example, by association of an update flag with the updated profile information. The updated profile information may thus provide trending information via comparison of parameters obtained during a current dynamic test and those obtained from previous testing. Reliability of a core may be indicated by the trending information of parameters that characterize the core. As described, the updated profile information may correspond to an indication of a failed core or reduced maximum operation speed of a core, for example. Based on this updated information, one or more cores of the many-core processor may be reconfigured (block 370). Furthermore, to efficiently provide communication between such reconfigured cores, the interconnection fabric may also be reconfigured, in some embodiments (also block 370).
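The identification of updated profile information by means of an update flag might look like the following sketch. The dictionary layout, the key name `"updated"`, and the decision to clear the flag once the entry has been read are assumptions for this example only.

```python
def take_updated_entries(nvm):
    """Return ids of cores whose stored entries carry a set update flag, then clear the flags.

    `nvm` is a dictionary standing in for non-volatile memory; each entry is a
    dictionary that may contain a hypothetical boolean key "updated".
    """
    updated_ids = sorted(cid for cid, entry in nvm.items() if entry.get("updated"))
    for cid in updated_ids:
        nvm[cid]["updated"] = False
    return updated_ids
```

A non-empty return value would correspond to the "change" outcome at diamond 350, and the returned core ids would be the candidates for reconfiguration at block 370.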

Reconfiguration of the cores may include reassignment of one or more cores to a fast bin, slow bin, spare bin or the like. The number of bins may be tuned so as to obtain better control of the number of cores in a particular bin. A core may be moved from the fast bin to the slow bin over time due to degradation in its performance. When this occurs, a number of options may be pursued to maintain the performance of the processor at its current level. A spare core may be added to the fast bin, or existing cores may be run at a higher frequency using voltage scaling or body bias adjustment techniques.
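The option of adding a spare core to the fast bin when a core degrades can be sketched as below. The bin layout, the function name `demote_and_replace`, and the assumption that any spare core is fast enough to join the fast bin are hypothetical simplifications.

```python
def demote_and_replace(bins, core_id):
    """Move `core_id` from the fast bin to the slow bin and promote a spare, if any.

    `bins` maps the names "fast", "slow" and "spare" to lists of core ids and is
    modified in place. Returns the id of the promoted spare core, or None when
    the spare pool is empty. This sketch assumes a spare qualifies as fast.
    """
    bins["fast"].remove(core_id)
    bins["slow"].append(core_id)
    if bins["spare"]:
        promoted = bins["spare"].pop(0)
        bins["fast"].append(promoted)
        return promoted
    return None
```

When no spare is available, a caller could instead fall back on the other option described above, namely running existing cores at a higher frequency.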

In another example, cores may be grouped into bins according to the level of their power consumption or according to other parameters such as reliability parameters. Yet in another example, cores may be grouped into different sets of bins: one set according to the operating speed; one according to the power consumption level; one according to reliability parameters; and so forth. In one embodiment, cores may be grouped into different sets of bins after dynamic profiles are built. The binning process may be performed by a software/firmware module embedded in the many-core processor. In another embodiment, the binning process may be performed when a task is received by an operating system (OS) so that cores may be grouped into bins according to the specific task requirements.

As shown in FIG. 4, control may pass from block 370 back to diamond 350 for continued normal operation of the many-core processor. While described with this particular implementation in the embodiment of FIG. 4, it is to be understood that the scope of the present invention is not so limited, and different manners of accessing non-volatile storage to obtain profile information and use such information in allocation/reallocation or configuration/reconfiguration operations of the many-core processor may be implemented.

However, it is to be understood that reconfiguration need not necessarily occur when profile information is updated. For example, if the updated information indicates that a given core is no longer operating at its maximum frequency, the operating voltage provided to the core may be increased to obtain the same level of performance out of the core. Furthermore, instead of reconfiguring cores and/or interconnection fabric, currently-running processes may be moved to one or more different cores to attain a substantially similar level of performance without reconfiguration.

Thus in various embodiments, a resilient state of the many-core processor may be stored in a non-volatile memory. Such a resilient state may include profile information corresponding to the various cores, as well as current configuration information, such as configuration of the interconnection fabric, partitioning of the cores, voltage and frequency operation of the cores, and so forth. Such resilient state may be used to enable power up and recovery from faults. For example, the resilient state may be used to configure the many-core processor on power up and after reset, as well as to reconfigure the many-core processor upon a fault or other diminution in performance of one or more cores.

A range of non-volatile memory technologies may be implemented as the non-volatile memory in different embodiments. In some embodiments, a flash memory may be used to record the resilient state of the many-core processor. Such a flash memory may support block erase operations. In different implementations, a flash memory may support various read modes, as well as different programming modes. Depending upon the location of the non-volatile memory (e.g., on-chip or off-chip), security measures may be implemented in transferring information to and from the memory. For example, where the resilient state information is stored in an off-chip non-volatile memory, e.g., on a flash read only memory (ROM) device, the resilient state may be stored in an encrypted format and may be transmitted to the many-core processor in an encrypted manner.

Thus using embodiments of the present invention, state information regarding the current profile and reliability of a many-core processor may be maintained, even during sleep and standby states, as well as other power management techniques. Of course, such information may also be maintained while power to the many-core processor is off via the non-volatile memory. Using embodiments of the present invention, a many-core environment facing increased susceptibility to errors may provide reliable computing using resilient state information maintained in non-volatile memory.

The interconnect fabric in a many-core processor (such as the one shown in FIGS. 1 and 2) may be reconfigurable so as to derive good benefit from each bin of cores. Since the membership of a core in a particular bin may change over time, the bandwidth and latency between cores are subject to wide fluctuation with a static fabric. Thus, the interconnection fabric may be flexible and dynamically reconfigurable. When a mix of cores in the bins is changed, the available bandwidth and latency across cores in a bin may be evaluated and the fabric may be reconfigured if necessary to maintain a high level of connectivity. While the physical location of the cores on the die may not change, switches that form the fabric may be reconfigured so that cores in the same bin are in logical proximity to each other. The availability of multiple cores, the pool of spare cores and a high connectivity fabric enable quick recovery from faults with minimal performance degradation. As soon as a test identifies a problem with a particular core, that core may be decommissioned and moved out of active service. A core from the spare pool may take its place. Accordingly, the interconnect fabric may also be reconfigured to mitigate the effect of the faulty core being dropped from service, improving the ability of the processor to tolerate faults due to variation and degradation.
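One simple way to evaluate whether the cores of a bin remain in proximity after a change in the mix of cores is sketched below. The use of mean pairwise hop count on a two-dimensional mesh as a stand-in for latency, the row-major numbering of cores, the threshold value, and the function names are all assumptions of this example; an actual fabric would be evaluated with its own bandwidth and latency measures.

```python
from itertools import combinations


def mean_hops(core_ids, mesh_width=8):
    """Mean Manhattan distance between all pairs of cores on a 2D mesh.

    Cores are assumed to be numbered in row-major order, so core id
    `row * mesh_width + col` sits at (row, col). Returns 0.0 for fewer than two cores.
    """
    coords = [divmod(cid, mesh_width) for cid in core_ids]
    pairs = list(combinations(coords, 2))
    if not pairs:
        return 0.0
    total = sum(abs(r1 - r2) + abs(c1 - c2) for (r1, c1), (r2, c2) in pairs)
    return total / len(pairs)


def needs_reconfiguration(core_ids, max_mean_hops=4.0, mesh_width=8):
    """Report whether the bin's cores are spread farther apart than a hypothetical limit."""
    return mean_hops(core_ids, mesh_width) > max_mean_hops
```

Under these assumptions, a bin whose cores sit in opposite corners of the 8×8 array would be flagged for fabric reconfiguration, while a bin of four adjacent cores would not.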

Embodiments may be implemented in many different system types. Referring now to FIG. 5, shown is a block diagram of a multiprocessor system in accordance with an embodiment of the present invention. As shown in FIG. 5, the multiprocessor system is a point-to-point interconnect system, and includes a first processor 470 and a second processor 480 coupled via a point-to-point interconnect 450. As shown in FIG. 5, each of processors 470 and 480 may be multicore processors, including first and second processor cores (i.e., processor cores 474 a and 474 b and processor cores 484 a and 484 b). Each of processors 470 and 480 may further include non-volatile memory to store resilient state data regarding the cores of the corresponding processor. First processor 470 further includes a memory controller hub (MCH) 472 and point-to-point (P-P) interfaces 476 and 478. Similarly, second processor 480 includes an MCH 482 and P-P interfaces 486 and 488. As shown in FIG. 5, MCHs 472 and 482 couple the processors to respective memories, namely a memory 432 and a memory 434, which may be portions of main memory locally attached to the respective processors.

First processor 470 and second processor 480 may be coupled to a chipset 490 via P-P interconnects 452 and 454, respectively. As shown in FIG. 5, chipset 490 includes P-P interfaces 494 and 498. Furthermore, chipset 490 includes an interface 492 to couple chipset 490 with a high performance graphics engine 438. In one embodiment, an Accelerated Graphics Port (AGP) bus 439 may be used to couple graphics engine 438 to chipset 490. AGP bus 439 may conform to the Accelerated Graphics Port Interface Specification, Revision 2.0, published May 4, 1998, by Intel Corporation, Santa Clara, Calif. Alternately, a point-to-point interconnect 439 may couple these components.

In turn, chipset 490 may be coupled to a first bus 416 via an interface 496. In one embodiment, first bus 416 may be a Peripheral Component Interconnect (PCI) bus, as defined by the PCI Local Bus Specification, Production Version, Revision 2.1, dated June 1995 or a bus such as the PCI Express bus or another third generation input/output (I/O) interconnect bus, although the scope of the present invention is not so limited.

As shown in FIG. 5, various I/O devices 414 may be coupled to first bus 416, along with a bus bridge 418 which couples first bus 416 to a second bus 420. In one embodiment, second bus 420 may be a low pin count (LPC) bus. Various devices may be coupled to second bus 420 including, for example, a keyboard/mouse 422, communication devices 426 and a data storage unit 428 which may include code 430, in one embodiment. Data storage unit 428, which may be a non-volatile storage such as a flash memory, further may include resilient state data 432 to store resilient state data for processors 470 and 480, in some embodiments. Further, an audio I/O 424 may be coupled to second bus 420.

Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.

While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.

1. A method comprising: performing dynamic testing of a many-core processor including a plurality of cores; manipulating data obtained from the dynamic testing into profile information regarding the many-core processor; and storing the profile information in a non-volatile memory.
2. The method of claim 1, further comprising partitioning the plurality of cores into a plurality of performance bins based on the dynamic testing, and storing bin information regarding the partitioning in the non-volatile memory.
3. The method of claim 1, wherein the profile information comprises static information regarding operational parameters of each of the plurality of cores.
4. The method of claim 3, further comprising reconfiguring the many-core processor based on updated profile information obtained after a change to at least one of the operational parameters of at least one core.
5. The method of claim 4, wherein reconfiguring the many-core processor comprises reconfiguring an interconnect fabric coupling the plurality of cores based on the updated profile information.
6. The method of claim 1, further comprising: accessing the non-volatile memory to obtain the profile information upon initialization of the many-core processor; and configuring the many-core processor using the profile information.
7. The method of claim 1, further comprising storing resilient state information regarding the many-core processor in the non-volatile memory, the resilient state information including performance bin information for each of the plurality of cores, task allocation information regarding one or more cores allocated to a task, and configuration information regarding an interconnect fabric that couples the plurality of cores.
8. The method of claim 1, further comprising storing the profile information in the non-volatile memory, wherein the non-volatile memory is located on a die of the many-core processor.
9. An article comprising a machine-readable storage medium including instructions that if executed by a machine enable the machine to perform a method comprising: accessing a non-volatile memory to obtain profile information of a many-core processor; enabling a plurality of cores of the many-core processor based on the profile information; and configuring an interconnection fabric of the many-core processor based on the profile information to couple the enabled plurality of cores.
10. The article of claim 9, wherein the method further comprises self-testing the many-core processor to determine functional correctness of the enabled plurality of cores.
11. The article of claim 10, wherein the method further comprises: disabling one of the enabled plurality of cores after the self-testing and enabling another of the plurality of cores; and updating the profile information in the non-volatile memory based on the disabling and the enabling.
12. The article of claim 11, wherein the method further comprises reconfiguring the interconnection fabric based on the disabling and enabling and updating the profile information in the non-volatile memory based on the reconfigured interconnection fabric.
13. A system comprising: a many-core processor including a plurality of cores and a non-volatile memory to store resilient state information regarding the plurality of cores, wherein the many-core processor is to access the resilient state information to configure one or more of the plurality of cores for operation; and a dynamic random access memory (DRAM) coupled to the many-core processor.
14. The system of claim 13, wherein the many-core processor is to perform dynamic self-testing and to update the resilient state information based on the dynamic self-testing.
15. The system of claim 14, wherein the system is to reconfigure the many-core processor based on the updated resilient state information.
16. The system of claim 13, wherein the resilient state information comprises performance bin information for each of the plurality of cores, task allocation information regarding one or more cores allocated to a task, and configuration information regarding an interconnect fabric that couples the plurality of cores.
17. The system of claim 13, wherein the DRAM and the many-core processor are located on a single die, the DRAM comprising a shared memory for the plurality of cores.